Viterbi decoding for latent words language models using Gibbs sampling
Abstract
This paper introduces a new approach that directly uses latent words language models (LWLMs) in automatic speech recognition (ASR). LWLMs are effective against data sparseness because of their soft-decision clustering structure and Bayesian modeling, so they can be expected to perform robustly across multiple ASR tasks. Unfortunately, applying an LWLM to ASR is difficult because of its computational complexity. In our previous work, we implemented an approximate LWLM for ASR by sampling words according to a stochastic process and training word n-gram LMs. However, that approach cannot take into account the latent variable sequence behind the recognition hypothesis. To solve this problem, we propose a method based on Viterbi decoding that simultaneously decodes the recognition hypothesis and its latent variable sequence, using Gibbs sampling for rapid decoding. Our experiments, based on n-best rescoring, show the effectiveness of the proposed Viterbi decoding. We also investigate the effect of combining the previous approximate LWLM with the proposed Viterbi decoding.
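The abstract does not spell out the decoding procedure, so the following is only a minimal sketch of one way the idea could look, not the authors' algorithm. It assumes a toy LWLM in which each observed word is emitted by one latent word and latent words follow a bigram chain. All names and probability tables here (`joint_log_prob`, `gibbs_best_latents`, `rescore_nbest`, `TOY_TRANS`, `TOY_EMIT`) are hypothetical. For each n-best hypothesis, Gibbs sampling explores latent word sequences, the highest-scoring sample stands in for the Viterbi latent sequence, and the hypothesis with the best combined score is selected.

```python
import math
import random

# Hypothetical toy tables; a real LWLM would be trained from text.
TOY_VOCAB = ["a", "b"]
TOY_TRANS = {  # assumed bigram model over latent words: P(h_t | h_{t-1})
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.3, "b": 0.7},
    "b": {"a": 0.5, "b": 0.5},
}
TOY_EMIT = {  # assumed emission model: P(w_t | h_t)
    "a": {"a": 0.9, "b": 0.1},
    "b": {"a": 0.2, "b": 0.8},
}


def joint_log_prob(words, latents, trans, emit):
    """Return log P(words, latents) under the toy bigram latent-word model."""
    log_p = 0.0
    prev = "<s>"
    for w, h in zip(words, latents):
        log_p += math.log(trans[prev][h]) + math.log(emit[h][w])
        prev = h
    return log_p


def gibbs_best_latents(words, vocab, trans, emit, sweeps=20, seed=0):
    """Gibbs-sample latent sequences; keep the best-scoring one seen so far."""
    rng = random.Random(seed)
    latents = list(words)  # start with each latent word equal to its observed word
    best_score = joint_log_prob(words, latents, trans, emit)
    best_latents = list(latents)
    for _ in range(sweeps):
        for t, w in enumerate(words):
            prev = latents[t - 1] if t > 0 else "<s>"
            nxt = latents[t + 1] if t + 1 < len(words) else None
            # Conditional of h_t given its neighbours and the observed word.
            weights = []
            for h in vocab:
                p = trans[prev][h] * emit[h][w]
                if nxt is not None:
                    p *= trans[h][nxt]
                weights.append(p)
            latents[t] = rng.choices(vocab, weights=weights)[0]
        score = joint_log_prob(words, latents, trans, emit)
        if score > best_score:
            best_score, best_latents = score, list(latents)
    return best_score, best_latents


def rescore_nbest(nbest, vocab, trans, emit, lm_weight=1.0):
    """Pick the hypothesis maximising acoustic score + weighted LWLM score."""
    scored = []
    for words, acoustic_log_score in nbest:
        lm_score, latents = gibbs_best_latents(words, vocab, trans, emit)
        scored.append((acoustic_log_score + lm_weight * lm_score, words, latents))
    return max(scored, key=lambda item: item[0])


if __name__ == "__main__":
    # Two made-up hypotheses with made-up acoustic log scores.
    nbest = [(["a", "b", "b"], -3.0), (["a", "a", "b"], -2.8)]
    total, words, latents = rescore_nbest(nbest, TOY_VOCAB, TOY_TRANS, TOY_EMIT)
    print(total, words, latents)
```

Keeping the best sample rather than the last one makes the result an approximate maximiser of the joint score. An exact Viterbi search over latent words would be possible for this bigram toy, but it becomes expensive with a large vocabulary and longer contexts, which is presumably why a sampling-based shortcut is attractive.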
Similar resources
Decoding Running Key Ciphers
There has been recent interest in the problem of decoding letter substitution ciphers using techniques inspired by natural language processing. We consider a different type of classical encoding scheme known as the running key cipher, and propose a search solution using Gibbs sampling with a word language model. We evaluate our method on synthetic ciphertexts of different lengths, and find that...
Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling
Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic model...
The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data
The goal of this study is to introduce a statistical method for analyzing specific latent data in the regression analysis of discrete data, and to build a relation between a probit regression model (related to the discrete response) and a normal linear regression model (related to the latent data of the continuous response). This method provides precise inferences on binary and multinomia...
A Latent Dirichlet Framework for Relevance Modeling
Relevance-based language models operate by estimating the probabilities of observing words in documents relevant (or pseudo relevant) to a topic. However, these models assume that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. This could limit model robustness and effectiveness. In this study, we propose a Latent Dirichlet relevance model, whic...
Latent Variable Perceptron Algorithm for Structured Classification
We propose a perceptron-style algorithm for fast discriminative training of structured latent variable models, and analyze its convergence properties. Our method extends the perceptron algorithm to learning tasks with latent dependencies, which may not be captured by traditional models. It relies on Viterbi decoding over latent variables, combined with simple additive updates. Compared to e...
Publication date: 2013